AITopics | reward and transition probability

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

a724b9124acc7b5058ed75a31a9c2919-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-19-2025, 22:49:20 GMT

algorithm, dependence, reward and transition probability

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.33)

Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

Neural Information Processing Systems, Mar-14-2024, 21:32:05 GMT

We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Besides the existence of an optimal policy that satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities.
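The two ingredients named in this abstract, state aggregation and upper confidence bounds, can be illustrated with a small sketch. It assumes a one-dimensional state space [0, 1] split into equal-width cells, rewards in [0, 1], and a Hoeffding-style bonus; the class and function names are hypothetical. It covers only optimistic reward estimates per aggregated cell: the confidence sets over aggregated transition probabilities and the optimistic average-reward planning step of the actual algorithm are not reproduced.

```python
import math
from collections import defaultdict


def aggregate(state: float, num_intervals: int) -> int:
    """Map a state in [0, 1] to the index of one of num_intervals equal-width cells."""
    # Clamp so that state == 1.0 lands in the last cell instead of overflowing.
    return min(int(state * num_intervals), num_intervals - 1)


class OptimisticAggregatedRewards:
    """Hypothetical helper: empirical rewards per (cell, action) plus a UCB bonus."""

    def __init__(self, num_intervals: int, actions, delta: float = 0.05):
        self.num_intervals = num_intervals
        self.actions = list(actions)
        self.delta = delta
        self.counts = defaultdict(int)        # visits per (cell, action)
        self.reward_sums = defaultdict(float)  # summed rewards per (cell, action)
        self.t = 0                             # total number of observations

    def update(self, state: float, action, reward: float) -> None:
        key = (aggregate(state, self.num_intervals), action)
        self.counts[key] += 1
        self.reward_sums[key] += reward
        self.t += 1

    def upper_bound(self, state: float, action) -> float:
        key = (aggregate(state, self.num_intervals), action)
        n = self.counts[key]
        if n == 0:
            return float("inf")  # untried pairs are treated as maximally promising
        mean = self.reward_sums[key] / n
        # Hoeffding-style bonus for rewards in [0, 1]; the exact constant is an assumption.
        bonus = math.sqrt(math.log(max(self.t, 2) / self.delta) / (2 * n))
        return mean + bonus

    def greedy_action(self, state: float):
        # Optimism in the face of uncertainty: act on the upper bound, not the mean.
        return max(self.actions, key=lambda a: self.upper_bound(state, a))


model = OptimisticAggregatedRewards(num_intervals=4, actions=[0, 1])
model.update(0.10, 0, 1.0)
model.update(0.12, 0, 1.0)
print(model.greedy_action(0.2))  # prints 1: action 1 is untried in this cell
```

Coarser cells pool more data per estimate but blur the reward function; the Hölder continuity assumption is what bounds the error introduced by that blurring.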

reinforcement, reward and transition probability, transition probability

Country:

Europe > Austria > Styria > Leoben (0.04)
Europe > France > Hauts-de-France > Pas-de-Calais (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)

Deterministic Sequencing of Exploration and Exploitation for Reinforcement Learning

Gupta, Piyush, Srivastava, Vaibhav

arXiv.org Artificial Intelligence, Dec-19-2022

We propose the Deterministic Sequencing of Exploration and Exploitation (DSEE) algorithm, which interleaves exploration and exploitation epochs, for model-based RL problems that aim to simultaneously learn the system model, i.e., a Markov decision process (MDP), and the associated optimal policy. During exploration, DSEE explores the environment and updates its estimates of the expected reward and transition probabilities. During exploitation, the latest estimates of the expected reward and transition probabilities are used to obtain a robust policy with high probability. We design the lengths of the exploration and exploitation epochs so that the cumulative regret grows sublinearly in time.
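The epoch structure and the model estimates described above can be sketched as follows. This is a minimal illustration under stated assumptions: exploration epochs have a fixed length and exploitation epochs grow geometrically (the paper's actual epoch lengths are not reproduced), and `dsee_schedule` and `EmpiricalMDP` are hypothetical names. The robust planning step used during exploitation is omitted; it would consume the estimates returned by `reward` and `transition`.

```python
from collections import defaultdict


def dsee_schedule(num_epochs: int, explore_len: int = 10, exploit_base: int = 2):
    """Yield ("explore", length) / ("exploit", length) pairs for a hypothetical schedule.

    Assumption: exploration epochs keep a fixed length while exploitation epochs
    grow geometrically, so the fraction of time spent exploring shrinks over time.
    """
    for k in range(num_epochs):
        yield ("explore", explore_len)
        yield ("exploit", explore_len * exploit_base ** (k + 1))


class EmpiricalMDP:
    """Running estimates of expected reward and transition probabilities."""

    def __init__(self):
        self.visits = defaultdict(int)        # n(s, a)
        self.reward_sum = defaultdict(float)  # sum of rewards observed at (s, a)
        self.next_counts = defaultdict(lambda: defaultdict(int))  # n(s, a, s')

    def record(self, s, a, r, s_next):
        # Called on every exploration step to refine the model.
        self.visits[(s, a)] += 1
        self.reward_sum[(s, a)] += r
        self.next_counts[(s, a)][s_next] += 1

    def reward(self, s, a):
        n = self.visits[(s, a)]
        return self.reward_sum[(s, a)] / n if n else 0.0

    def transition(self, s, a):
        n = self.visits[(s, a)]
        return {s2: c / n for s2, c in self.next_counts[(s, a)].items()} if n else {}


print(list(dsee_schedule(3)))
# [('explore', 10), ('exploit', 20), ('explore', 10), ('exploit', 40), ('explore', 10), ('exploit', 80)]
mdp = EmpiricalMDP()
mdp.record("s0", "a", 1.0, "s1")
mdp.record("s0", "a", 0.0, "s0")
print(mdp.reward("s0", "a"), mdp.transition("s0", "a"))  # 0.5 {'s1': 0.5, 's0': 0.5}
```

Because the schedule is fixed in advance rather than driven by observed data, the sequencing is deterministic; only the estimates fed to the exploitation planner depend on what was observed.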

data mining, machine learning, reinforcement learning

2209.05408

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Variational Regret Bounds for Reinforcement Learning

Gajane, Pratik, Ortner, Ronald, Auer, Peter

arXiv.org Machine Learning, May-23-2019

For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. The upper bound on the regret is given in terms of the total variation in the MDP. This is the first variational regret bound for the general reinforcement learning setting.

For reinforcement learning in MDPs with changes in reward function and transition probabilities, we provide an algorithm, UCRL with Restarts, a version of UCRL [Jaksch et al., 2010], which restarts according to a schedule dependent on the variation in the MDP (defined in Section 2 below). We derive a high-probability upper bound on the cumulative regret of our algorithm of …
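The restart mechanism can be sketched in isolation. This minimal illustration assumes the restart times are supplied by the caller: the paper's schedule, which depends on the variation in the MDP, is not reproduced, and `RestartingCounts` is a hypothetical name. It tracks only reward statistics; a full UCRL learner would also reset its transition counts and confidence sets at each restart.

```python
from collections import defaultdict


class RestartingCounts:
    """Hypothetical sketch of the restart mechanism only.

    A UCRL-style learner keeps visit counts and reward statistics; here those
    statistics are discarded at each time in `restart_times`, so data gathered
    before a change in the MDP stops biasing later estimates.
    """

    def __init__(self, restart_times):
        self.restart_times = set(restart_times)
        self.t = 0
        self.num_restarts = 0
        self._reset()

    def _reset(self):
        self.visits = defaultdict(int)
        self.reward_sum = defaultdict(float)

    def step(self, s, a, r):
        self.t += 1
        if self.t in self.restart_times:
            # Forget everything learned so far and start a fresh phase.
            self._reset()
            self.num_restarts += 1
        self.visits[(s, a)] += 1
        self.reward_sum[(s, a)] += r

    def mean_reward(self, s, a):
        n = self.visits[(s, a)]
        return self.reward_sum[(s, a)] / n if n else 0.0


learner = RestartingCounts(restart_times=[4])
for t in range(1, 7):
    reward = 1.0 if t < 4 else 0.0  # the reward function changes at t = 4
    learner.step("s", "a", reward)
print(learner.mean_reward("s", "a"), learner.num_restarts)  # 0.0 1
```

Without the restart, the same six observations would give a mean of 0.5, mixing the old and new reward functions; restarting more often tracks changes faster but leaves fewer samples per estimate, which is the trade-off a variation-dependent schedule has to balance.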

artificial intelligence, machine learning, reinforcement learning

1905.05857

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.81)

Convergence of a Q-learning Variant for Continuous States and Actions

Carden, S. W.

Journal of Artificial Intelligence Research, Apr-29-2014

This paper presents a reinforcement learning algorithm for solving infinite horizon Markov Decision Processes under the expected total discounted reward criterion when both the state and action spaces are continuous. This algorithm is based on Watkins' Q-learning, but uses Nadaraya-Watson kernel smoothing to generalize knowledge to unvisited states. As expected, continuity conditions must be imposed on the mean rewards and transition probabilities. Using results from kernel regression theory, this algorithm is proven capable of producing a Q-value function estimate that is uniformly within an arbitrary tolerance of the true Q-value function with probability one. The algorithm is then applied to an example problem to empirically show convergence as well.
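The core idea, Q-values stored at sample points and generalized to unvisited state-action pairs by Nadaraya-Watson kernel smoothing, can be sketched as below. This is a minimal illustration, not the paper's algorithm: the Gaussian kernel, bandwidth, constant step size `alpha`, scalar states and actions, and the finite grid of candidate actions used to approximate the maximum over a continuous action space are all assumptions, and `KernelQ` is a hypothetical name. The paper's convergence guarantee depends on conditions not reproduced here.

```python
import math


def gaussian_kernel(d2: float, bandwidth: float) -> float:
    """Gaussian kernel evaluated on a squared distance d2."""
    return math.exp(-d2 / (2 * bandwidth ** 2))


class KernelQ:
    """Q-values stored at sample points, generalized by Nadaraya-Watson smoothing."""

    def __init__(self, points, bandwidth=0.2, gamma=0.9, alpha=0.5):
        self.points = list(points)         # (state, action) pairs, both floats here
        self.q = [0.0] * len(self.points)  # Q-value stored at each point
        self.bandwidth = bandwidth
        self.gamma = gamma                 # discount factor
        self.alpha = alpha                 # step size (constant here for simplicity)

    def _weights(self, s, a):
        w = [gaussian_kernel((s - ps) ** 2 + (a - pa) ** 2, self.bandwidth)
             for ps, pa in self.points]
        total = sum(w)
        return [wi / total for wi in w]   # normalized weights sum to 1

    def value(self, s, a):
        # Nadaraya-Watson estimate: kernel-weighted average of stored Q-values.
        return sum(wi * qi for wi, qi in zip(self._weights(s, a), self.q))

    def update(self, s, a, r, s_next, candidate_actions):
        # Q-learning target; the max over actions uses a finite candidate grid.
        target = r + self.gamma * max(self.value(s_next, b) for b in candidate_actions)
        td_error = target - self.value(s, a)
        # Spread the update over stored points in proportion to their kernel weights.
        for i, wi in enumerate(self._weights(s, a)):
            self.q[i] += self.alpha * wi * td_error


grid = [(s, a) for s in (0.0, 0.5, 1.0) for a in (0.0, 1.0)]
agent = KernelQ(grid)
agent.update(0.0, 0.0, r=1.0, s_next=0.0, candidate_actions=[0.0, 1.0])
print(round(agent.value(0.0, 0.0), 3), round(agent.value(0.25, 0.5), 3))
```

The second printed value is queried at a pair that was never visited; it is nonzero only because the kernel weights carry the update from nearby points, which is why continuity of the mean rewards and transition probabilities is needed for such generalization to be sound.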

artificial intelligence, machine learning, reinforcement learning

doi: 10.1613/jair.4271

AI Access Foundation

10876

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.04)
North America > United States > Ohio (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

Ortner, Ronald, Ryabko, Daniil

Neural Information Processing Systems, Dec-31-2012

We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Besides the existence of an optimal policy that satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities.

artificial intelligence, machine learning, reinforcement learning

Country: Europe (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.74)

Regret Bounds for Restless Markov Bandits

Ortner, Ronald, Ryabko, Daniil, Auer, Peter, Munos, Rémi

arXiv.org Machine Learning, Sep-12-2012

We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that after $T$ steps achieves $\tilde{O}(\sqrt{T})$ regret with respect to the best policy that knows the distributions of all arms. No assumptions on the Markov chains are made except that they are irreducible. In addition, we show that index-based policies are necessarily suboptimal for the considered problem.
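What makes the problem "restless" is that every arm's Markov chain keeps moving whether or not the arm is pulled. The toy environment below illustrates that dynamic together with a reachability check for the irreducibility assumption. It is a sketch under stated assumptions: finite state spaces, a reward determined by the pulled arm's current state, and a learner that observes only the pulled arm's reward and new state. `RestlessMarkovBandit` and `is_irreducible` are hypothetical names, and the paper's learning algorithm is not reproduced.

```python
import random


def is_irreducible(P):
    """Check that every state of a finite chain can reach every other state."""
    n = len(P)
    for start in range(n):
        seen, stack = {start}, [start]
        while stack:  # depth-first search over positive-probability edges
            i = stack.pop()
            for j, p in enumerate(P[i]):
                if p > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if len(seen) < n:
            return False
    return True


class RestlessMarkovBandit:
    """Toy environment: every arm's state evolves each step, pulled or not."""

    def __init__(self, transitions, rewards, init_states, seed=0):
        self.P = transitions        # P[k][i][j]: prob. arm k moves from state i to j
        self.R = rewards            # R[k][i]: reward for pulling arm k in state i
        self.states = list(init_states)
        self.rng = random.Random(seed)

    def pull(self, k):
        reward = self.R[k][self.states[k]]
        # Restless dynamics: all arms transition, independently of the learner's action.
        for arm, P in enumerate(self.P):
            row = P[self.states[arm]]
            self.states[arm] = self.rng.choices(range(len(row)), weights=row)[0]
        return reward, self.states[k]  # only the pulled arm is observed


flip = [[0.0, 1.0], [1.0, 0.0]]  # deterministic 2-cycle, irreducible
stay = [[1.0, 0.0], [0.0, 1.0]]  # absorbing states, not irreducible
print(is_irreducible(flip), is_irreducible(stay))  # True False
env = RestlessMarkovBandit([flip, flip], rewards=[[0, 1], [0, 1]], init_states=[0, 0])
env.pull(0)
print(env.states)  # [1, 1]: arm 1 moved too, although only arm 0 was pulled
```

Because unpulled arms keep evolving, the learner's knowledge of their current states goes stale, which gives some intuition for why policies that score each arm by its own index in isolation can be suboptimal here.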

artificial intelligence, machine learning, probability

1209.2693

Country: Europe (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)